Soft System: identification, visualization and analysis

Analysis of 311 Data

N.B. ArcGIS Map: https://arcg.is/8ifnL

Abstract

The soft system I've chosen to explore in the map linked above and in this notebook is the NYC 311 reporting service. 311 is a 'hotline' that users can dial to submit service requests for government agencies to address. These agencies can then take this data and create work orders for teams to fix the problems raised. This system exists primarily in urban settings and covers non-emergency complaints such as noise, downed trees, and illegally parked cars. One can think of it as a ticketing system that the NYC government uses to receive, track, and prioritize urban issues. 311 reporting is thus a soft system that helps orchestrate the healthy function of NYC municipal services.

It's important to differentiate 311 reporting from the hard systems that it's inherently intertwined with. In the soft systems view, the 311 reporting network is much more fluid and not to be conflated with:

  1. the physical communications network infrastructure (mobile, internet networks) that the system leverages in order to operate
  2. the service requests that relate to networks of hard physical infrastructure: transportation, water & sanitation, buildings etc.

There are 'hard' elements then in 311 operations. But we shall define the overall network as characterized by a set of human processes and interactions that make it decidedly 'soft'.

The nodes of this network are the City agencies and 59 local community boards across the city. It would also be fair to suggest the nodes be the call centers that handle 311 service requests, but that would be conflating the system with network #1 above. In addition, these call centers are not the political decision-making units that determine which work orders should be carried out to address concerns. Instead, it is the main responsibility of the community board office to assess the needs of their neighborhoods, receive complaints from residents, and meet with City agencies to make recommendations on actionable items. To this end, Local Law 47 of 2005 requires the Department of Information Technology and Telecommunications (DoITT) to issue monthly reports to community boards and the public.

The agents in this network are the operators that correspond to the nodes described above. Ultimately, the boards serve as advocates but tasks are performed by agency officials and employees. The community boards themselves each consist of up to 50 non-salaried members appointed by the Borough President. And by far the most prevalent agents in this data will be the 311 reporters themselves. The 311 data links agents in these communities through a shared involvement and interest in the wellbeing of the neighborhood. These agents can be any member of the community that resides, works, or has some other significant interest in the area.

The process flow of this network has been hinted at above. Over time, service requests flow from neighborhood residents to community boards (through the aforementioned Local Law 47 of 2005), which then advocate for them to city agencies. In turn, responses and fixes to the issues raised flow back into the community.

This analysis only has visibility into the first phase of this flow, and leverages 4 sets of data from NYC Open Data sources:

  1. 311 Service Requests from 2010 to Present: provides 311 reporting as far back as 2010, and is updated on an automated daily basis. Each record in this dataset corresponds to a 311 service request.
  2. NYC Community Boards + Community Districts: (for the ArcGIS map) provides the locations of community boards described as nodes above, as well as neighborhood boundaries.
  3. Agency Service Center: (for the ArcGIS map) provides locations of agency service centers described as nodes above.
  4. 311 Service Level Agreements: provides the time commitments that City agencies have made to respond to 311 service requests that are assigned to them.

Major takeaways include:

  1. Periodicity: several agencies face 311 calls in seasonal waves. For example, Housing Preservation and Development (HPD) faces most calls in winter months when heat/hot water are front of mind. The NYPD faces most calls in summer months, when most residents are out and about. There are also some peak events corresponding to the ending of COVID-19 lockdowns.
  2. TTR varies by geography: Deeper parts of Brooklyn and Staten Island are more poorly serviced when measured by time to resolution. This can stand in contrast to where the most 311 submissions are actually filed.
  3. SLA breaches: Most agencies outside of the Department of Consumer Affairs and the NYPD are struggling to meet their SLAs.

SQL Data Mining

311 Service Requests from 2010 to Present

Given the timeframe and nature of this data, you may not be surprised to find that there are 41 columns and 31.4M records (and counting), each record being one service request. Because this dataset is likely well over 10GB, it would be computationally infeasible to work with all of the data without chunking / filtering. To avoid abusing my poor little laptop, we'll work directly with the Socrata API and run SQL-like (SoQL) queries against the database to understand some high-level information about the dataset first.

Let's count the number of records grouped by:

  1. Complaint Type: This is the first level of a hierarchy identifying the topic of the incident or condition. Complaint Type may have a corresponding Descriptor (below) or may stand alone.
  2. Status: Status of SR submitted (Assigned, Cancelled, Closed, Pending, +)
  3. Agency
  4. Borough
  5. Medium: Indicates how the SR was submitted to 311, i.e. by Phone, Online, Mobile, Other or Unknown. Phone - submitted by a 311 call center agent on behalf of a customer. Online - submitted through the 311 website. Mobile - submitted through the 311 mobile app. Other - submitted by another city agency or source. Unknown - unable to determine the source channel of the SR.
  6. Year
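A minimal sketch of how such grouped counts could be requested through SoQL. The resource ID and the API field names (`complaint_type`, `open_data_channel_type`, `created_date`, etc.) are assumptions about the dataset's public endpoint rather than values taken from this notebook, and `count_by` / `fetch` are hypothetical helper names. Nothing is sent over the network until `fetch` is called.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Socrata resource ID for "311 Service Requests from 2010 to Present".
BASE_URL = "https://data.cityofnewyork.us/resource/erm2-nwe9.json"

# Assumption: API field names for the groupings listed above.
GROUP_COLUMNS = [
    "complaint_type",
    "status",
    "agency",
    "borough",
    "open_data_channel_type",
    # Year is not stored as a column, so it is derived from the creation timestamp.
    "date_extract_y(created_date)",
]


def count_by(column, limit=50):
    """Build SoQL parameters that count records per distinct value of `column`."""
    return {
        "$select": f"{column}, count(*) AS n",
        "$group": column,
        "$order": "n DESC",
        "$limit": str(limit),
    }


def fetch(params, base_url=BASE_URL):
    """Send one query to the API and return the decoded JSON rows."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)
```

Because the aggregation runs on the server, each query returns at most a few dozen rows, e.g. `fetch(count_by("agency"))`, instead of requiring the full dataset to be downloaded.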

Aggregate EDA

How has the medium through which users interact with the 311 system changed over time?

How has the volume of requests fielded by agencies changed over time?

How have the complaint types changed over time? Are there any spikes in volume for specific complaints?

Ingestion + Cleaning

Due to limitations of SoQL and suspicions I have about the data quality of older reporting, we'll confine ourselves to year-to-date data for the second part of this analysis.
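One way the year-to-date restriction could be expressed is as a SoQL `$where` clause combined with `$limit` / `$offset` paging. This is a sketch only: the field names `created_date` and `unique_key` are assumed from the dataset's API naming, and `ytd_where` / `page_params` are hypothetical helpers.

```python
from datetime import date


def ytd_where(today=None, field="created_date"):
    """Return a SoQL $where clause selecting records created since 1 January."""
    today = today or date.today()
    start = date(today.year, 1, 1)
    return f"{field} >= '{start.isoformat()}T00:00:00'"


def page_params(where, page, page_size=50_000):
    """SoQL parameters for one page of rows matching `where`."""
    return {
        "$where": where,
        # A stable sort order keeps consecutive pages from overlapping.
        "$order": "unique_key",
        "$limit": str(page_size),
        "$offset": str(page * page_size),
    }
```

Pages would be requested with increasing `page` values until the API returns fewer than `page_size` rows.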

GeoPandas

At this point, we're going to write the data out to a shapefile, since the compression steps below are incompatible with that file format.

Compression

Most of these fields are actually categorical as opposed to free-text, so they can be stored far more compactly.

Great! We cut memory usage by ~40%
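The kind of compression described here can be sketched with pandas' `category` dtype. This is an illustration under assumptions, not necessarily the exact steps used in this notebook: `compress`, `memory_mb`, and the `max_unique_ratio` threshold are hypothetical, and the savings on any particular frame depend on how often values repeat.

```python
import pandas as pd


def compress(df, max_unique_ratio=0.5):
    """Return a copy with low-cardinality text columns stored as category dtype."""
    out = df.copy()
    for col in out.columns:
        is_text = (pd.api.types.is_object_dtype(out[col])
                   or pd.api.types.is_string_dtype(out[col]))
        if not is_text:
            continue
        # Only convert when values repeat often; near-unique text gains nothing.
        if out[col].nunique(dropna=True) / max(len(out), 1) <= max_unique_ratio:
            out[col] = out[col].astype("category")
    return out


def memory_mb(df):
    """Deep memory footprint in megabytes (includes the string payloads)."""
    return df.memory_usage(deep=True).sum() / 1e6
```

Comparing `memory_mb(df)` with `memory_mb(compress(df))` gives the before/after footprint. Columns such as agency or borough hold a handful of repeated values and shrink the most, while near-unique fields are left untouched.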

Granular EDA

What are people complaining about?

Where are complaints being filed from?

See ArcGIS hotspots

When are people complaining?

When, if ever, were there spikes in time to resolution (TTR) for 311 service requests?

How long does it take for agencies to resolve 311 service requests by neighborhood?
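A hedged sketch of how time to resolution per neighborhood might be computed. It assumes TTR is defined as the gap between `created_date` and `closed_date` and that neighborhoods are identified by a `community_board` column; these column names and the helper names are assumptions, not confirmed by this notebook.

```python
import pandas as pd


def add_ttr_hours(df):
    """Return closed requests with a `ttr_hours` column (closure minus creation)."""
    out = df.copy()
    created = pd.to_datetime(out["created_date"], errors="coerce")
    closed = pd.to_datetime(out["closed_date"], errors="coerce")
    out["ttr_hours"] = (closed - created).dt.total_seconds() / 3600
    # Still-open requests (missing closed date) and negative durations are dropped.
    return out[out["ttr_hours"] >= 0]


def median_ttr_by(df, key="community_board"):
    """Median TTR in hours per group, slowest groups first."""
    ttr = add_ttr_hours(df)
    return ttr.groupby(key)["ttr_hours"].median().sort_values(ascending=False)
```

The median is used rather than the mean because a few very long-running requests would otherwise dominate a neighborhood's figure.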

Which agencies are not meeting their SLAs?
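One possible way to quantify SLA breaches is to join each request to its committed response time and compare. The sketch below assumes the requests already carry a `ttr_hours` column and that the SLA data has been reduced to one `sla_hours` value per join key. Both column names, and the `breach_rate` helper, are hypothetical.

```python
import pandas as pd


def breach_rate(requests, slas, keys=("agency",)):
    """Share of requests per group whose TTR exceeded the committed SLA."""
    keys = list(keys)
    # Inner join: requests without a matching SLA commitment are left out.
    merged = requests.merge(slas, on=keys, how="inner")
    merged["breached"] = merged["ttr_hours"] > merged["sla_hours"]
    return merged.groupby(keys)["breached"].mean().sort_values(ascending=False)
```

Passing `keys=("agency", "complaint_type")` would compare at a finer grain, provided the SLA table carries commitments at that level.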

There are many ways in which this analysis can be taken further:

  1. Agency backlog
  2. Topic modeling / clustering with complaint types and descriptors
  3. SLA / TTR heatmap aggregation on ArcGIS
  4. Spatiotemporal analysis